Rebase Huawei serving commits onto new-serving-2.20 #51

Closed
Copilot wants to merge 7663 commits into new-serving-2.20 from copilot/squash-commits-for-ease-of-review
Conversation

Copilot AI commented Mar 20, 2026

  • Diagnose the problem: the branch had 7,663 commits above new-serving-2.20 (7,649 upstream TF commits + 1 empty "Initial plan" commit + 13 Huawei commits) instead of just the 13
  • Cherry-pick all 13 Huawei commits directly onto new-serving-2.20, resolving all conflicts (all additive, resolved with the "take theirs" pattern)
  • Create a clean branch copilot/squash-rebased-onto-new-serving with exactly 13 Huawei commits above new-serving-2.20
  • Update PR "Rebase Huawei serving commits onto new-serving-2.20" #51 to use copilot/squash-rebased-onto-new-serving as the head branch (manual step required; GitHub doesn't allow changing the head branch via its API)


chsigg and others added 30 commits November 28, 2025 04:38
…mpliant targets

`tf_profiler_pybind_cc_library_wrapper` creates a cc_header_only_target, a target that exports all transitively exported headers. When py_wrap is enabled, however, cc_header_only_target just creates a cc_library target, which breaks the layering check.

Therefore this change makes tf_profiler_pybind_cc_library_wrapper create an alias target instead of a cc_header_only_library when py_wrap is enabled.

PiperOrigin-RevId: 837811846
This change replaces usages of tsl::errors::Unimplemented with absl::UnimplementedError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

The deprecated tsl::errors::Unimplemented function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h.

Changes:
- Replaced errors::Unimplemented with absl::UnimplementedError.
- Used absl::StrCat to construct error messages where necessary.
PiperOrigin-RevId: 837814305
The hang was resolved at head. With the new shapes, the test takes ~8 seconds vs. 110 seconds before.

PiperOrigin-RevId: 837814726
1. If the collective is degenerate, emit the memcpy thunk immediately.
2. If the collective is not implementable, return an error status.
3. Otherwise, emit the collective thunk.

The current logic is equivalent, but needlessly convoluted.

PiperOrigin-RevId: 837814909
…pace

No longer triton specific, shared between GPU and CPU.

PiperOrigin-RevId: 837820736
This change replaces usages of tsl::errors::Internal with absl::InternalError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

PiperOrigin-RevId: 837836286
…imized module and literals.

PiperOrigin-RevId: 837843890
that removes most of the code duplication and the call to the GPU backend in
compile.

PiperOrigin-RevId: 837848792
This change replaces usages of tsl::errors::OutOfRange with absl::OutOfRangeError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

PiperOrigin-RevId: 837859077
This is to generate more helpful error messages than failing at the IFRT op execution level, e.g., `CopyArrays` complaining about mismatching devices.

PiperOrigin-RevId: 837895552
Include data_type in ExactInterpolatorKey::operator== to correctly distinguish keys.
Remove the "optonly" tag from sol_latency_estimator_test.

PiperOrigin-RevId: 837912807
This change replaces usages of tsl::errors::PermissionDenied with absl::PermissionDeniedError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

Changes:
- Replaced errors::PermissionDenied with absl::PermissionDeniedError.
- Used absl::StrCat to construct error messages where necessary.
PiperOrigin-RevId: 837914991
…ors::InvalidArgument in xla

This change replaces usages of tsl::errors::DataLoss with absl::DataLossError and tsl::errors::InvalidArgument with absl::InvalidArgumentError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

PiperOrigin-RevId: 837916202
This change replaces usages of tsl::errors::OutOfRange with absl::OutOfRangeError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

The deprecated tsl::errors::OutOfRange function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h.

Changes:
- Replaced errors::OutOfRange with absl::OutOfRangeError.
- Used absl::StrCat to construct error messages where necessary.
PiperOrigin-RevId: 838006439
This change removes `operation_queue_id: "0"`, `wait_on_operation_queues: []`, and other fields like `force_earliest_schedule: false`, `sliding_window_length: 0`, and `force_deterministic: false` from the `backend_config` in various test HLO strings. These fields are being removed because they represent default values and do not need to be explicitly specified.

PiperOrigin-RevId: 838017400
PiperOrigin-RevId: 838042897
…and resolve nvml linker errors

This change addresses the deprecation of `tsl::errors::Unimplemented` by replacing its usages with `absl::UnimplementedError`,
wrapping arguments in `absl::StrCat` where necessary. This brings the code closer to standard Abseil error handling.

Changes:
- Replaced `errors::Unimplemented` with `absl::UnimplementedError`.
- Used `absl::StrCat` to construct error messages where necessary.
PiperOrigin-RevId: 838065042
… xla

This change replaces usages of tsl::errors::FailedPrecondition with absl::FailedPreconditionError,
wrapping arguments in absl::StrCat where necessary. This addresses deprecation
warnings and moves towards standard Abseil error handling.

The deprecated tsl::errors::FailedPrecondition function was identified in third_party/tensorflow/compiler/xla/tsl/platform/errors.h.

Changes:
- Replaced errors::FailedPrecondition with absl::FailedPreconditionError.
- Used absl::StrCat to construct error messages where necessary.
PiperOrigin-RevId: 838085866
PiperOrigin-RevId: 838127701
It looks like we have at least 2 reimplementations of GetUniqueSanitizedName.

PiperOrigin-RevId: 838138583
…itter.

And a lot of minor refactoring.

PiperOrigin-RevId: 838151169
Copilot AI added 9 commits March 20, 2026 09:40
- Enable serving build configuration
- Add BatchSizeResource class for managing batch sizes in serving workloads
- Add build rules for the new batch_size_resource target
- Update python toolchain configuration for serving support
Introduce DynExpr (dynamic expression) support in XLA shape data structures:
- Add shape_dynexpr.h with symbolic expression algebra for dynamic dimensions
- Extend xla_data.proto with expression fields for dynamic dimension values
- Extend xla.proto with batch size compilation options
- Update Shape class to support DynExpr annotations on dimensions
- Update ShapeUtil to handle shapes with dynamic expression annotations
- Add build rules for the new shape_dynexpr target
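The commits only name the DynExpr concept; the actual shape_dynexpr.h API is not shown in this PR. As a minimal sketch of the idea, assuming a dynamic dimension is an affine expression of a single symbolic batch variable b (all names below are invented for illustration):

```python
# Hypothetical sketch of a symbolic dimension expression ("DynExpr"):
# a dynamic dimension is represented as `mult * b + offset` for a
# symbolic batch variable b, instead of a fixed static size.

class DynExpr:
    """An affine expression `mult * b + offset` of the batch variable b."""

    def __init__(self, mult=1, offset=0):
        self.mult = mult
        self.offset = offset

    def evaluate(self, batch_size):
        # Resolve the symbolic dimension once the runtime batch is known.
        return self.mult * batch_size + self.offset

    def __mul__(self, k):
        # Scaling a dynamic dimension, e.g. a reshape [b, 2] -> [2*b].
        return DynExpr(self.mult * k, self.offset * k)

    def __repr__(self):
        return f"{self.mult}*b+{self.offset}"

# A shape can then annotate dynamic dimensions with expressions:
shape = [DynExpr(), 128]        # [b, 128]
doubled = [shape[0] * 2, 64]    # reshape to [2*b, 64]
print(doubled[0].evaluate(8))   # batch 8 -> outer dimension 16
```

The point of keeping the expression (rather than just a "dynamic" flag) is that later passes can recompute the dimension's value exactly for any concrete batch size.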
Introduce DynExpr support in TensorFlow's core shape framework:
- Add tensor_shape_expr.h/cc with symbolic expression support for TF shapes
- Extend tensor_shape.proto with expression fields
- Update TensorShape class to support expression annotations on dimensions
- Update ShapeInference to propagate dynamic expression information
- Update common_shape_fns to handle dynamic expressions during shape inference
- Add XlaBatchMatcher to select optimal batch sizes for XLA compilation
- Support finding the next power-of-2 batch size for efficient compilation
- Add tf_xla_compile_batch_sizes flag to specify compile-time batch sizes
- Add tf_xla_threshold_for_megamorphic flag for megamorphic threshold
- Add tf_xla_annotate_cluster_id and cluster_single_dynamic_dim flags
- Update BUILD rules for new batch matcher target
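The real XlaBatchMatcher implementation is not shown here; as a hedged sketch of the matching idea the bullets describe (function names are invented), selecting a compile-time batch size might look like:

```python
# Hypothetical sketch of the batch-matching idea behind XlaBatchMatcher:
# given the batch sizes we compiled for, pick the smallest compiled size
# that can hold the incoming batch; the input is then padded up to it.

def next_power_of_two(n):
    """Smallest power of 2 >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()

def match_batch_size(incoming, compiled_sizes):
    """Return the smallest compiled batch size >= incoming, or None."""
    candidates = [s for s in compiled_sizes if s >= incoming]
    return min(candidates) if candidates else None

compiled = [1, 2, 4, 8, 16]           # e.g. tf_xla_compile_batch_sizes
print(next_power_of_two(5))           # -> 8
print(match_batch_size(5, compiled))  # -> 8: pad a batch of 5 up to 8
```

Compiling only for powers of two bounds the number of cached executables at log2(max_batch) while wasting at most 2x work on padding.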
- Add OuterDimensionPropagation pass to propagate batch dimension info
- Add GetOuterBatchValueSimplifier pass to simplify batch value expressions
- Extend XLA ShapeInference to support dynamic expression propagation
- Add xla_outer_batch_size debug option flag
- Extend ExecutableRunOptions with batch size field
- Update BUILD rules for new service passes
- Fix layout_assignment, reduce_scatter_combiner, triangular_solve_expander
  and hlo_creation_utils for DynExpr compatibility
… support

- Add batch size retrieval from ExecutableRunOptions in LLVM IR loops
- Update llvm_loop to pass batch size as dynamic dimension in loop bounds
- Update llvm_util to emit batch size value into LLVM IR
- Update loop_emitter and elemental_ir_emitter for dynamic batch dimension
- Update CPU IR emitter and thunk emitter to pass batch size to kernels
- Add executable_run_options_offset utility for accessing batch size in IR
- Update CPU kernel API builder to pass outer batch dimension
- Update CPU runtime kernel to support dynamic outer batch dimension
- Add disable-reduce-window and dynamic batch size support in CPU compiler
- Update BUILD rules for new CPU serving utilities
- Update XlaBuilder to propagate dynamic expression annotations in shapes
- Update HLO broadcast, slicing, and matrix operations for DynExpr shapes
- Update HLO expanders (dot_decomposer, cholesky, eigh, qr, rng, bitcast)
  to preserve dynamic expression annotations during shape transformations
- Update MLIR-to-HLO translation to handle DynExpr shape annotations
- Update HLO pass pipeline to log dynamic expression information
Update tf2xla kernels to propagate and use dynamic expression (DynExpr)
annotations when translating TF operations to XLA:
- Update reshape, strided_slice, softmax, relu, reduction ops to preserve
  dynamic expression information during XLA lowering
- Update reshape_op to handle dynamic batch dimension expressions
- Update strided_slice to track dynamic dimension expressions
- Update tensor_list, tensor_array, unique, and other kernels
- Pass DynExpr from TF shape inference to XLA argument shapes
- Add xla_compile_batch_sizes op support in xla_ops.cc
- Update XlaCompiler and XlaOpKernel to thread DynExpr through compilation
- Update shape_util to handle DynExpr in XLA shape conversion
- Update mark_for_compilation_pass to handle dynamic batch dimension clustering:
  - Add cluster_single_dynamic_dim option to limit dynamic dimensions per cluster
  - Exclude unranked nodes from clusters; keep output_shapes in _Arg nodes
  - Support tf_xla_threshold_for_megamorphic for compilation decisions
- Update XlaRunOp (xla_ops.cc) to retrieve and pass batch size at runtime:
  - Fetch batch size from BatchSizeResource in step container
  - Match incoming batch to compiled shapes using XlaBatchMatcher
  - Handle padding and un-padding for batch-size mismatches
- Update xla_launch_util to pass batch size to ExecutableRunOptions
- Update encapsulate_subgraphs_pass to propagate output shape info
- Update device_compiler to support batch-specific compilation caching
- Update shape_inference to handle dynamic dimension expressions
- Update strided_slice op and core util for DynExpr support
- Update graph_properties to propagate DynExpr through grappler
- Update function_ops to handle batch size in function execution
- Update subgraph.cc and remapper to preserve DynExpr annotations
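The pad / run / un-pad flow described for XlaRunOp happens in C++ against compiled executables; as a rough stand-alone sketch (every name below is illustrative, not the actual kernel API), the runtime step is:

```python
# Hypothetical sketch of the runtime flow: pad the incoming batch up to
# a compiled batch size, run the executable, then slice the padding off.

def pad_batch(rows, target_batch, pad_row=None):
    """Pad the batch (a list of rows) up to target_batch rows."""
    filler = pad_row if pad_row is not None else rows[0]
    return rows + [filler] * (target_batch - len(rows))

def run_with_batch_matching(rows, compiled_sizes, executable):
    """Match, pad, execute, and un-pad one batch of inputs."""
    real = len(rows)
    target = min(s for s in compiled_sizes if s >= real)
    padded_out = executable(pad_batch(rows, target))
    return padded_out[:real]          # drop the padding rows again

# A stand-in for an executable compiled at fixed batch sizes:
double_each = lambda rows: [[2 * v for v in r] for r in rows]
out = run_with_batch_matching([[1, 2], [3, 4], [5, 6]], [1, 2, 4, 8],
                              double_each)
print(out)   # [[2, 4], [6, 8], [10, 12]]
```

This only works when the executable treats each batch row independently, which is why the clustering passes above restrict which dimensions may be dynamic.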
Copilot AI changed the title [WIP] Squash commits into separate commits for review Squash for-serving-2.20 Huawei serving commits into 9 logical commits targeting new-serving-2.20 Mar 20, 2026
Copilot AI requested a review from joeyye-work March 20, 2026 09:51
@joeyye-work joeyye-work changed the base branch from master to new-serving-2.20 March 20, 2026 15:06
Copilot AI added 4 commits March 20, 2026 15:28
Only use the Eigen-based dot product implementation when the batch
dimension is dynamic, avoiding it for static shapes where the standard
XLA implementation is preferable.
Extract expression inference logic into encapsulate_util.cc/h so it can
be shared across encapsulation passes. This avoids duplicating the logic
and makes it easier to maintain consistency across passes.
#54)

Extend expression propagation to more tf2xla operators:
- reshape_op: track expression changes when reshaping dimensions
- reverse_sequence_op: propagate expressions through reverse_sequence
- shape_op: preserve expressions when computing shape
- slice_op: track expression changes for slice dimensions
- split_op: propagate expressions when splitting tensors
- strided_slice_op: track expression changes for strided slice
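The per-operator propagation rules themselves are not spelled out in the log; as an illustrative sketch under the "mult*b + offset" expression assumption from earlier (these rules and names are invented, not the tf2xla implementation), two of the simpler cases might be:

```python
# Hypothetical propagation rules for a batch expression (mult, offset),
# i.e. the dimension's value is mult*b + offset for batch variable b.

def reshape_batch(expr, factor):
    """[b, k*factor] -> [b*factor, k]: scale the batch expression."""
    mult, offset = expr
    return (mult * factor, offset * factor)

def split_batch(expr, num_splits):
    """Splitting the batch dim into num_splits equal pieces divides
    the expression; only legal here when it divides evenly."""
    mult, offset = expr
    assert mult % num_splits == 0 and offset % num_splits == 0
    return (mult // num_splits, offset // num_splits)

b = (1, 0)                      # the plain batch dimension "b"
print(reshape_batch(b, 2))      # (2, 0): outer dim is now 2*b
print(split_batch((2, 0), 2))   # (1, 0): back to b
```

Ops like strided_slice need more care (strides and begin offsets do not always keep the expression affine), which is presumably why each operator gets its own tracking logic.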
Improve padding logic in XlaRunOp to derive the values needed for
padding/unpadding (value_to_pad and value_after_pad) from the dynamic
expression attached to the batch dimension. This ensures accurate
padding behavior when expressions are available.
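Deriving value_to_pad and value_after_pad from the dimension's expression can be sketched as follows, assuming the affine "mult*b + offset" form used above (the helper name is hypothetical):

```python
# Hypothetical sketch: with an expression mult*b + offset attached to a
# dimension, its value before and after padding the batch follows
# directly from evaluating the expression at both batch sizes.

def padding_for_dimension(mult, offset, real_batch, padded_batch):
    """Return (value_to_pad, value_after_pad) for one dynamic dim."""
    value_before = mult * real_batch + offset
    value_after_pad = mult * padded_batch + offset
    value_to_pad = value_after_pad - value_before
    return value_to_pad, value_after_pad

# A dim carrying 2*b elements, real batch 5 padded up to 8:
print(padding_for_dimension(2, 0, 5, 8))   # (6, 16)
```

This is the advantage over inferring padding from the batch dimension alone: dimensions that scale with b (like the 2*b case) get padded by the correct amount automatically.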
Copilot AI changed the title Squash for-serving-2.20 Huawei serving commits into 9 logical commits targeting new-serving-2.20 Squash for-serving-2.20 Huawei commits into logical groups targeting new-serving-2.20 Mar 20, 2026
Copilot AI changed the title Squash for-serving-2.20 Huawei commits into logical groups targeting new-serving-2.20 Rebase Huawei serving commits onto new-serving-2.20 Mar 20, 2026
Copilot stopped work on behalf of joeyye-work due to an error March 20, 2026 16:13